Abstract. Preference queries are relational algebra or SQL queries that
contain occurrences of the winnow operator (find the most preferred tu- 
ples in a given relation). We present here a number of semantic optimiza- 
tion techniques applicable to preference queries. The techniques make it 
possible to remove redundant occurrences of the winnow operator and 
to apply a more efficient algorithm for the computation of winnow. We 
also study the propagation of integrity constraints in the result of the 
winnow. We have identified necessary and sufficient conditions for the 
applicability of our techniques, and formulated those conditions as con- 
straint satisfiability problems. 



1 Introduction 

The notion of preference is becoming more and more ubiquitous in present-day 
information systems. Preferences are primarily used to filter and personalize the 
information reaching the users of such systems. In database systems, preferences 
are usually captured as preference relations that are used to build preference 
queries [Cho02,Cho03,Kie02,KK02]. From a formal point of view, preference re- 
lations are simply binary relations defined on query answers. Such relations 
provide an abstract, generic way to talk about a variety of concepts like prior- 
ity, importance, relevance, timeliness, reliability etc. Preference relations can be 
defined using logical formulas [Cho02,Cho03] or special preference constructors 
[Kie02] (preference constructors can be expressed using logical formulas). The 
embedding of preference relations into relational query languages is typically 
provided through a relational operator that selects from its argument relation 
the set of the most preferred tuples, according to a given preference relation. This 
operator has been variously called winnow (the term we use here) [Cho02,Cho03], 
BMO [Kie02], and Best [TC02]. (It is also implicit in skyline queries [BKS01].) 
Being a relational operator, winnow can clearly be combined with other rela- 
tional operators, in order to express complex preference queries. 

Example 1. We introduce an example used throughout the paper. Consider the 
relation Book(ISBN, Vendor, Price) and the following preference relation $\succ_{C_1}$
between Book tuples:

prefer one Book tuple to another if and only if their ISBNs are the same
and the Price of the first is lower.



Consider the instance $r_1$ of Book in Figure 1. Then the winnow operator $\omega_{C_1}$
returns the set of tuples in Figure 2.



ISBN        Vendor        Price
0679726691  BooksForLess  $14.75
0679726691  LowestPrices  $13.50
0679726691  QualityBooks  $18.80
0062059041  BooksForLess  $7.30
0374164770  LowestPrices  $21.88

Figure 1. The Book relation

ISBN        Vendor        Price
0679726691  LowestPrices  $13.50
0062059041  BooksForLess  $7.30
0374164770  LowestPrices  $21.88

Figure 2. The result of winnow



Example 2. The above example is a one-dimensional skyline query. To see an 
example of a two-dimensional skyline, consider the schema of Book expanded by 
another attribute Rating. Define the following preference relation $\succ_{C_2}$:

prefer one Book tuple to another if and only if their ISBNs are the same 
and the Price of the first is lower and the Rating of the first is not lower, 
or the Price of the first is not higher and the Rating of the first is higher. 

Then $\omega_{C_2}$ is equivalent to the following skyline (in the terminology of [BKS01]):

SKYLINE ISBN DIFF, Price MIN, Rating MAX. 

The above notation indicates that only books with the same ISBN should be 
compared, that Price should be minimized, and Rating maximized. In fact, the 
tuples in the skyline satisfy the property of Pareto-optimality, well known in 
economics. 
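
As a minimal sketch of Example 2, the following Python fragment evaluates the two-dimensional skyline by brute force. The Rating values are hypothetical (the text gives none), the names `prefers_c2` and `winnow` are illustrative, and the ISBN equality is assumed to apply to both disjuncts of $C_2$, as the SKYLINE notation suggests.

```python
from typing import List, Tuple

RatedBook = Tuple[str, str, float, int]  # (ISBN, Vendor, Price, Rating)

def prefers_c2(t1: RatedBook, t2: RatedBook) -> bool:
    """C_2 under the assumed reading: same ISBN, and t1 is at least as good
    on both Price and Rating and strictly better on one of them."""
    same_isbn = t1[0] == t2[0]
    cheaper_not_worse = t1[2] < t2[2] and t1[3] >= t2[3]
    better_not_dearer = t1[2] <= t2[2] and t1[3] > t2[3]
    return same_isbn and (cheaper_not_worse or better_not_dearer)

def winnow(prefers, rows):
    """Keep the tuples of rows that are not dominated by any tuple of rows."""
    return [t for t in rows if not any(prefers(u, t) for u in rows)]

# Prices follow Figure 1; the ratings are made up for illustration only.
RATED_BOOKS: List[RatedBook] = [
    ("0679726691", "BooksForLess", 14.75, 5),
    ("0679726691", "LowestPrices", 13.50, 3),
    ("0679726691", "QualityBooks", 18.80, 4),
    ("0062059041", "BooksForLess", 7.30, 4),
]

skyline = winnow(prefers_c2, RATED_BOOKS)
for row in skyline:
    print(row)
```

With these made-up ratings, the QualityBooks offer is dominated (it is both more expensive and lower rated than the BooksForLess offer for the same ISBN), while the other two offers for that ISBN are incomparable and both remain in the skyline.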

Preference queries can be reformulated in relational algebra or SQL, and thus 
optimized and evaluated using standard relational techniques. However, it has 
been recognized that specialized evaluation and optimization techniques promise 
in this context performance improvements that are otherwise unavailable. A 
number of new algorithms for the evaluation of skyline queries (a special class 



of preference queries) have been proposed [BKS01,CGGL03,KRR02,PTFS03]. 
Some of them can be used to evaluate general preference queries [Cho03]. Also, 
algebraic laws that characterize the interaction of winnow with the standard 
operators of relational algebra have been formulated [Cho03,KH02,KH03]. Such 
laws provide a foundation for the rewriting of preference queries. For instance, 
necessary and sufficient conditions for pushing a selection through winnow are 
described in [Cho03]. The algebraic laws cannot be applied unconditionally. In 
fact, the preconditions of their applications refer to the validity of certain con- 
straint formulas. 

In this paper, we pursue the line of research from [Cho03] a bit further. We
study semantic optimization of preference queries. Semantic query optimization 
has been extensively studied for relational and deductive databases [CGM90]. As 
a result, a body of techniques dealing with specific query transformations like join 
elimination and introduction, predicate introduction etc. has been developed. 
We view semantic query optimization very broadly and classify as semantic any 
query optimization technique that makes use of integrity constraints. In the 
context of preference queries, we focus on the winnow operator. Despite the 
presence of specialized evaluation techniques, winnow is still quite an expensive 
operation. We develop optimizing techniques that: 

1. remove redundant occurrences of winnow; 

2. recognize when more efficient evaluation of winnow is possible. 

More efficient evaluation of winnow can be achieved, for example, if the given 
preference relation is a weak order (a negatively transitive strict partial order). 
We show that even when the preference relation is not a weak order (as in Ex- 
ample 1), it may become equivalent to a weak order on the relations satisfying 
certain integrity constraints. We show a very simple, single-pass algorithm for 
evaluating winnow under those conditions. We also pay attention to the issue 
of satisfaction of integrity constraints in the result of applying winnow. In fact, 
some constraints may hold in the result of winnow, even though they do not 
hold in the relation to which winnow is applied. Combined with known results 
about the preservation of integrity constraints by relational algebra operators 
[Klu80,KP82], our results provide a way for optimizing not only single occur- 
rences of winnow but also complex preference queries. As in the case of the 
algebraic transformations described in [Cho03], the semantic transformations 
described in this paper have preconditions referring to the validity of certain 
constraint formulas. Thus, such preconditions can be checked using well-established
constraint satisfaction techniques [GSW96] (see footnote 1).

The plan of the paper is as follows. In Section 2 we define basic notions. We 
limit ourselves here to integrity constraints that are functional dependencies. In 
Section 3 we address the issue of eliminating redundant occurrences of winnow. 
In Section 4 we study weak orders. In Section 5 we characterize dependencies 
holding in the result of winnow. In Section 6 we show how our results can be 



1 A formula is valid iff its negation is unsatisfiable.



generalized to constraint-generating dependencies [BCW99]. We briefly discuss 
related work in Section 7 and conclude in Section 8. 

2 Basic notions 

We are working in the context of the relational model of data. For concreteness, 
we consider two infinite domains: $\mathcal{D}$ (uninterpreted constants) and $\mathcal{Q}$ (rational
numbers). Other domains could be considered as well without influencing
most of the results of the paper. We assume that database instances are finite. 
Additionally, we have the standard built-in predicates. 

2.1 Preference relations 

Definition 1. Given a relation schema $R(A_1 \cdots A_k)$ such that $U_i$, $1 \le i \le k$,
is the domain (either $\mathcal{D}$ or $\mathcal{Q}$) of the attribute $A_i$, a relation $\succ$ is a preference
relation over $R$ if it is a subset of $(U_1 \times \cdots \times U_k) \times (U_1 \times \cdots \times U_k)$.

Intuitively, $\succ$ will be a binary relation between tuples from the same (database)
relation. We say that a tuple $t_1$ dominates a tuple $t_2$ in $\succ$ if $t_1 \succ t_2$.
Typical properties of the relation $\succ$ include:

— irreflexivity: $\forall x.\ x \not\succ x$,

— asymmetry: $\forall x, y.\ x \succ y \Rightarrow y \not\succ x$,

— transitivity: $\forall x, y, z.\ (x \succ y \wedge y \succ z) \Rightarrow x \succ z$,

— negative transitivity: $\forall x, y, z.\ (x \not\succ y \wedge y \not\succ z) \Rightarrow x \not\succ z$,

— connectivity: $\forall x, y.\ x \succ y \vee y \succ x \vee x = y$.

The relation $\succ$ is:

— a strict partial order if it is irreflexive and transitive (thus also asymmetric); 

— a weak order if it is a negatively transitive strict partial order; 

— a total order if it is a connected strict partial order. 

At this point, we do not assume any properties of $\succ$, although in most
applications it will satisfy at least the properties of strict partial order.

Definition 2. A preference formula (pf) $C(t_1, t_2)$ is a first-order formula
defining a preference relation $\succ_C$ in the standard sense, namely

$$t_1 \succ_C t_2 \ \text{ iff } \ C(t_1, t_2).$$

An intrinsic preference formula (ipf) is a preference formula that uses only
built-in predicates.

We will limit our attention to preference relations defined using intrinsic 
preference formulas. 

Because we consider two specific domains, $\mathcal{D}$ and $\mathcal{Q}$, we will have two kinds
of variables, $\mathcal{D}$-variables and $\mathcal{Q}$-variables, and two kinds of atomic formulas:

— equality constraints: $x = y$, $x \neq y$, $x = c$, or $x \neq c$, where $x$ and $y$ are
$\mathcal{D}$-variables, and $c$ is an uninterpreted constant;

— rational-order constraints: $x \theta y$ or $x \theta c$, where $\theta \in \{=, \neq, <, >, \le, \ge\}$, $x$ and
$y$ are $\mathcal{Q}$-variables, and $c$ is a rational number.

Without loss of generality, we will assume that ipfs are in DNF (Disjunctive 
Normal Form) and quantifier-free (the theories involving the above domains 
admit quantifier elimination). We also assume that atomic formulas are closed 
under negation (also satisfied by the above theories). An ipf all of whose atomic
formulas are equality (resp. rational-order) constraints will be called an equality
(resp. rational-order) ipf. Clearly, ipfs are a special case of general constraints
[KLP00], and define fixed, although possibly infinite, relations. By using the
notation $\succ_C$ for a preference relation, we assume that there is an underlying
preference formula $C$.

Definition 3. Given an instance $r$ of $R$ and a preference relation $\succ_C$ over $R$,
the restriction $\succ_C|_r$ of $\succ_C$ to $r$ is defined as

$$\succ_C|_r \;=\; \succ_C \cap\ (r \times r).$$

2.2 Winnow 

We define now an algebraic operator that picks from a given relation the set of 
the most preferred tuples, according to a given preference formula. 

Definition 4. If $R$ is a relation schema and $C$ a preference formula defining a
preference relation $\succ_C$ over $R$, then the winnow operator is written as $\omega_C(R)$,
and for every instance $r$ of $R$:

$$\omega_C(r) = \{t \in r \mid \neg\exists t' \in r.\ t' \succ_C t\}.$$

A preference query is a relational algebra query containing at least one oc- 
currence of the winnow operator. 

Example 3. Consider the relation Book(ISBN, Vendor, Price) (Example 1). The
preference relation $\succ_{C_1}$ from this example can be defined using the formula $C_1$:

$$(i, v, p) \succ_{C_1} (i', v', p') \equiv i = i' \wedge p < p'.$$

The answer to the preference query $\omega_{C_1}(Book)$ provides for every book the
information about the vendors offering the lowest price for that book.
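
Definition 4 and Example 3 can be illustrated with a brute-force evaluation of winnow on the instance of Figure 1. This is only a sketch: the names `prefers_c1`, `winnow` and `BOOKS` are illustrative, and the quadratic nested scan is chosen for clarity, not efficiency.

```python
from typing import Callable, Iterable, List, Tuple

Book = Tuple[str, str, float]  # (ISBN, Vendor, Price)

def prefers_c1(t1: Book, t2: Book) -> bool:
    """The formula C_1: same ISBN and strictly lower Price."""
    return t1[0] == t2[0] and t1[2] < t2[2]

def winnow(prefers: Callable[[Book, Book], bool], r: Iterable[Book]) -> List[Book]:
    """Return the tuples t of r such that no t' in r is preferred to t."""
    rows = list(r)
    return [t for t in rows if not any(prefers(u, t) for u in rows)]

# The instance of Book shown in Figure 1.
BOOKS: List[Book] = [
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
]

result = winnow(prefers_c1, BOOKS)
for row in result:
    print(row)
```

The three tuples kept in `result` are the ones listed in Figure 2: the cheapest offer for each ISBN.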

2.3 Indifference 

Every preference relation $\succ_C$ generates an indifference relation $\sim_C$: two tuples
$t_1$ and $t_2$ are indifferent ($t_1 \sim_C t_2$) if neither is preferred to the other one, i.e.,
$t_1 \not\succ_C t_2$ and $t_2 \not\succ_C t_1$.

Proposition 1. For every preference relation $\succ_C$, every relation $r$ and every
pair of tuples $t_1, t_2 \in \omega_C(r)$, we have $t_1 = t_2$ or $t_1 \sim_C t_2$.



2.4 Functional dependencies 



We assume that we are working in the context of a single relation schema and 
all the integrity constraints are over that schema. The set of all instances of R 
satisfying a set of integrity constraints F is denoted as Sat(F). We say that F 
entails an integrity constraint $f$ if every instance satisfying $F$ also satisfies $f$.

A functional dependency (FD) $f \equiv X \rightarrow Y$, where $X$ and $Y$ are sets of
attributes of $R$, can be written down as the following logic formula:

$$\forall t_1. \forall t_2.\ [R(t_1) \wedge R(t_2) \wedge t_1[X] = t_2[X]] \Rightarrow t_1[Y] = t_2[Y].$$

We use the following notation:

$$\varphi_f(t_1, t_2) \equiv t_1[X] = t_2[X] \Rightarrow t_1[Y] = t_2[Y].$$

For a set of FDs $F$, we define

$$\varphi_F \equiv \bigwedge_{f \in F} \varphi_f.$$

The arity of an FD $f \equiv X \rightarrow Y$ is the cardinality $|X \cup Y|$ of the set of
attributes $X \cup Y$. The arity of a set of FDs $F$ is the maximum arity of any FD
in $F$.

Note that the set of attributes $X$ in $X \rightarrow Y$ may be empty, meaning that
each attribute in $Y$ can assume only a single value.

3 Eliminating redundant occurrences of winnow 

Given an instance $r$ of $R$, the operator $\omega_C$ is redundant if $\omega_C(r) = r$. If we
consider the class of all instances of $R$, then such an operator is redundant for
every instance iff $\succ_C$ is an empty relation. The latter holds iff $C$ is unsatisfiable.
However, we are interested only in the instances satisfying a given set of integrity
constraints. Therefore, we will check whether the restriction $\succ_C|_r$ is empty for
every instance $r$ satisfying the given set of integrity constraints.

Definition 5. Given a set of integrity constraints $F$, the operator $\omega_C$ is
redundant w.r.t. $F$ if $\forall r \in Sat(F)$, $\omega_C(r) = r$.

Theorem 1. $\omega_C$ is redundant w.r.t. a set of FDs $F$ iff the following formula is
unsatisfiable:

$$\varphi_F(t_1, t_2) \wedge t_1 \succ_C t_2.$$

Proof. Assume that the formula in the theorem is satisfiable. Then there are tuples
$t_a$ and $t_b$ such that $\varphi_F(t_a, t_b)$ and $t_a \succ_C t_b$. Thus $t_b \notin \omega_C(\{t_a, t_b\})$ and thus $\omega_C$
is not redundant w.r.t. $F$. For the other direction, assume $\omega_C$ is not redundant
w.r.t. $F$. Then there is an instance $r_0 \in Sat(F)$ and a tuple $t_b \in r_0$ such that
$t_b \notin \omega_C(r_0)$. Thus, there must be a tuple $t_a$ in $r_0$ such that $t_a \succ_C t_b$. Clearly,
$\varphi_F(t_a, t_b)$ and therefore the formula in the theorem is satisfiable.



Theorem 1 shows that checking for redundancy w.r.t. a set of FDs F is a 
constraint satisfiability problem. 

Example 4. Consider Example 3 in which the FD ISBN $\rightarrow$ Price holds. Then

$$\varphi_F \equiv i_1 = i_2 \Rightarrow p_1 = p_2$$

and $\varphi_F(t_1, t_2) \wedge t_1 \succ_{C_1} t_2$ is

$$(i_1 = i_2 \Rightarrow p_1 = p_2) \wedge i_1 = i_2 \wedge p_1 < p_2.$$

The last formula is clearly unsatisfiable, and thus the condition in Theorem 1
holds and we can infer that $\omega_{C_1}$ is redundant w.r.t. ISBN $\rightarrow$ Price.
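
The check in Example 4 can be mimicked by a bounded search for a satisfying assignment. The sketch below is an illustration only, not the constraint-satisfaction procedure of [GSW96]: it enumerates pairs of tuples over a small, hypothetical finite domain (`ISBNS`, `PRICES`) and ignores the Vendor attribute, which does not occur in the formula.

```python
from itertools import product

ISBNS = ["a", "b"]   # hypothetical two-element ISBN domain
PRICES = [1, 2, 3]   # hypothetical three-element Price domain

def prefers_c1(t1, t2):
    """C_1 on (ISBN, Price) pairs: same ISBN and strictly lower Price."""
    return t1[0] == t2[0] and t1[1] < t2[1]

def phi_isbn_price(t1, t2):
    """phi_f for the FD ISBN -> Price: equal ISBNs imply equal Prices."""
    return t1[0] != t2[0] or t1[1] == t2[1]

def find_witness(phi_F, prefers):
    """Search the finite domain for t1, t2 with phi_F(t1, t2) and t1 preferred to t2."""
    tuples = list(product(ISBNS, PRICES))
    for t1, t2 in product(tuples, repeat=2):
        if phi_F(t1, t2) and prefers(t1, t2):
            return (t1, t2)
    return None

with_fd = find_witness(phi_isbn_price, prefers_c1)
without_fd = find_witness(lambda t1, t2: True, prefers_c1)
print(with_fd)      # no witness is found when the FD is assumed
print(without_fd)   # a witness exists when no FD is assumed
```

Under the FD no witness exists, which agrees with the unsatisfiability argument of Example 4; without the FD the search finds two offers for the same ISBN at different prices, so winnow is not redundant in general.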

How hard is it to check for redundancy w.r.t. a set of FDs $F$? We assume that
the size of a preference formula $C$ (over a relation $R$) in DNF is characterized
by two parameters: $width(C)$, the number of disjuncts in $C$, and $span(C)$,
the maximum number of conjuncts in a disjunct of $C$. Namely, if $C = D_1 \vee
\cdots \vee D_m$, and each $D_i = C_{i,1} \wedge \cdots \wedge C_{i,k_i}$, then $width(C) = m$ and $span(C) =
\max\{k_1, \ldots, k_m\}$.

Theorem 2. If:

— the cardinality of the set of FDs $F$ is $|F|$ and its arity is at most $k$;

— the given preference relation is defined using an ipf $C$ containing only atomic
constraints over the same domain and such that $width(C) \le m$, $span(C) \le n$;

— the time complexity of checking satisfiability of a conjunctive ipf with $n$
conjuncts is in $O(T(n))$,

then the time complexity of checking $\omega_C$ for redundancy with respect to $F$ is in
$O(m\, k^{|F|}\, T(\max(k|F|, n)))$.

The paper [GSW96] contains several results about checking satisfiability of 
conjunctive formulas. For instance, in the case of rational-order formulas, this 
problem is shown to be solvable in $O(n)$. This implies, for example, the following
corollary. 

Corollary 1. If a preference relation is defined by a conjunctive rational-order
ipf ($m = 1$) and the arity of $F$ is at most 2, then checking $\omega_C$ for redundancy
w.r.t. $F$ can be done in time $O(n\, 2^{|F|})$.

An analogous result can be derived for equality formulas. From now on we will 
only present detailed complexity analysis for rational-order formulas. 



4 Weak orders 



We have defined weak orders as negatively transitive strict partial orders. Equiv- 
alently, they can be defined as strict partial orders for which the indifference 
relation is transitive. Intuitively, a weak order consists of a number (perhaps 
infinite) of linearly ordered layers. In each layer, all the elements are mutually 
indifferent and they are all above all the elements in lower layers. 

Example 5. In the preference relation $\succ_{C_1}$ in Example 3, the first, second and
third tuples of Figure 1 are each indifferent to the fourth and fifth tuples. However,
the second tuple is preferred to the first (it has the lower Price), violating the
transitivity of indifference. Therefore, the preference relation $\succ_{C_1}$ is not a weak order.
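
The failure of transitivity of indifference on the instance of Figure 1 can be checked mechanically. The sketch below enumerates all triples of tuples; the helper names `indifferent` and `indifference_violations` are illustrative.

```python
from itertools import product

# The instance of Book shown in Figure 1: (ISBN, Vendor, Price).
BOOKS = [
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
]

def prefers_c1(t1, t2):
    """C_1: same ISBN and strictly lower Price."""
    return t1[0] == t2[0] and t1[2] < t2[2]

def indifferent(t1, t2):
    """Neither tuple is preferred to the other."""
    return not prefers_c1(t1, t2) and not prefers_c1(t2, t1)

def indifference_violations(rows):
    """All triples (x, y, z) with x ~ y and y ~ z but not x ~ z."""
    return [
        (x, y, z)
        for x, y, z in product(rows, repeat=3)
        if indifferent(x, y) and indifferent(y, z) and not indifferent(x, z)
    ]

violations = indifference_violations(BOOKS)
first, second, fourth = BOOKS[0], BOOKS[1], BOOKS[3]
print(len(violations) > 0)
print((first, fourth, second) in violations)
```

The triple (first, fourth, second) is one of the violations: the first and second tuples are both indifferent to the fourth, yet they are not indifferent to each other.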

Example 6. A preference relation $\succ_{C_f}$ defined as

$$x \succ_{C_f} y \equiv f(x) > f(y)$$

for some real-valued function $f$, is a weak order but not necessarily a total order.

4.1 Computing winnow

Many algorithms for evaluating winnow are possible. However, we discuss here 
those that have a good blocking behavior and thus are capable of processing very 
large data sets. 

We first review BNL (Figure 3), a basic algorithm for evaluating winnow, 
and show that for preference relations that are weak orders a much simpler and 
more efficient algorithm is possible. BNL was proposed in [BKS01] in the context 
of skyline queries. However, [BKS01] also noted that the algorithm requires only 
the properties of strict partial orders. BNL uses a fixed amount of main memory 
(a window) . It also needs a temporary table for the tuples whose status cannot be 
determined in the current pass, because the available amount of main memory 
is limited. 
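
The following is a minimal in-memory sketch of BNL (Figure 3), assuming the preference relation is a strict partial order. It is a simplification, not the implementation of [BKS01]: the temporary table is a Python list, `window_size` is an illustrative parameter, and instead of timestamps a tuple that entered the window after some tuple had already been deferred is simply kept for one more pass before being output.

```python
def prefers_c1(t1, t2):
    """C_1: same ISBN and strictly lower Price."""
    return t1[0] == t2[0] and t1[2] < t2[2]

def bnl(prefers, rows, window_size):
    """Winnow by blocked nested loops with a window of bounded size."""
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    output = []
    window = []             # entries are [tuple, confirmed]
    current = list(rows)    # the input of the current pass
    while current:
        temp = []           # temporary table for this pass
        for t in current:
            if any(prefers(w, t) for w, _ in window):
                continue    # t is dominated by a window tuple: ignore it
            # eliminate the window tuples dominated by t
            window = [e for e in window if not prefers(t, e[0])]
            if len(window) < window_size:
                # t is confirmed only if no tuple has been deferred so far
                window.append([t, not temp])
            else:
                temp.append(t)
        # output the tuples whose status is settled
        output.extend(w for w, ok in window if ok)
        # the rest stay in the window and meet the deferred tuples next pass
        window = [[w, True] for w, ok in window if not ok]
        current = temp
    return output

A1 = ("0679726691", "BooksForLess", 14.75)
A2 = ("0679726691", "LowestPrices", 13.50)
A3 = ("0679726691", "QualityBooks", 18.80)
B = ("0062059041", "BooksForLess", 7.30)
C = ("0374164770", "LowestPrices", 21.88)

small_window = bnl(prefers_c1, [A1, A2, A3, B, C], window_size=1)
reordered = bnl(prefers_c1, [A1, B, A2], window_size=1)
print(sorted(small_window))
print(sorted(reordered))
```

With a window of one tuple the Figure 1 instance needs three passes; the second call exercises the case where a tuple enters the window after another tuple has been deferred. The order of the output may differ from the input order, so results are compared as sets.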

BNL keeps in the window the best tuples discovered so far (some of them 
may also be in the temporary table). All the tuples in the window are mutually 
indifferent and they all need to be kept, since each may turn out to dominate 
some input tuple arriving later. For weak orders, however, if a tuple $t_1$ dominates
$t_2$, then any tuple indifferent to $t_1$ will also dominate $t_2$. In this case, indifference
is an equivalence relation, and thus it is enough to keep in main memory only a 
single tuple top from the top equivalence class. In addition, one has to keep track 
of all members of that class (called the current bucket B), since they may have 
to be returned as the result of the winnow. The new algorithm WWO (Winnow 
for Weak Orders) is shown in Figure 4. 

It is clear that WWO requires only a single pass over the input. It uses 
additional memory (whose size is at most equal to the size of the input) to keep 
track of the current bucket. However, this memory is only written and read once, 
the latter at the end of the execution of the algorithm. Clearly, for weak orders 
WWO is considerably more efficient than BNL. Note that for weak orders BNL 



1. clear the window W and the temporary table F;

2. make r the input;

3. repeat the following until the input is empty:

   (a) for every tuple t in the input:

       — t is dominated by a tuple in W => ignore t,

       — t dominates some tuples in W => eliminate the dominated tuples
         and insert t into W,

       — if t and all tuples in W are mutually indifferent => insert
         t into W (if there is room), otherwise add t to F;

   (b) output the tuples from W that were added there when F was empty,

   (c) make F the input, clear the temporary table.

Figure 3. BNL: Blocked Nested Loops



1. top := the first input tuple

2. B := {top}

3. for every subsequent tuple t in the input:

   — t is dominated by top => ignore t,

   — t dominates top => top := t; B := {t}

   — t and top are indifferent => B := B ∪ {t}

4. output B

Figure 4. WWO: Weak Order Winnow



does not simply reduce to WWO. Note also that if additional memory is not 
available, WWO can execute in a small, fixed amount of memory by using two 
passes over the input: in the first, a top tuple is identified, and in the second, all 
the tuples indifferent to it are selected. 

In [CGGL03] we proposed SFS, a more efficient variant of BNL for skyline 
queries, in which a presorting step is used. Because sorting may require more 
than one pass over the input, that approach will also be less efficient than WWO 
for weak orders. 
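
The WWO procedure of Figure 4 can be sketched in Python as follows. The sketch keeps the bucket in an in-memory list and is only correct when the preference relation is a weak order on the input. The example input uses the three Figure 1 tuples that share one ISBN (on which $C_1$ behaves as a weak order) plus one hypothetical tuple, `EXTRA`, added to show a tie.

```python
def prefers_c1(t1, t2):
    """C_1: same ISBN and strictly lower Price."""
    return t1[0] == t2[0] and t1[2] < t2[2]

def wwo(prefers, rows):
    """Single-pass winnow, assuming prefers is a weak order on rows."""
    it = iter(rows)
    try:
        top = next(it)          # step 1: the first input tuple
    except StopIteration:
        return []               # empty input, empty result
    bucket = [top]              # step 2: the current bucket B
    for t in it:                # step 3
        if prefers(top, t):
            continue            # t is dominated by top: ignore t
        if prefers(t, top):
            top = t             # t dominates top: start a new bucket
            bucket = [t]
        else:
            bucket.append(t)    # t and top are indifferent
    return bucket               # step 4

SAME_ISBN = [
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
]
# Hypothetical extra offer, not in Figure 1, that ties with the best price.
EXTRA = ("0679726691", "HypotheticalVendor", 13.50)

best = wwo(prefers_c1, SAME_ISBN)
best_with_tie = wwo(prefers_c1, SAME_ISBN + [EXTRA])
print(best)
print(best_with_tie)
```

Each input tuple is compared only with `top`, so the number of comparisons is linear in the size of the input, in contrast to the window scans of BNL.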

4.2 Relative weak orders 

Even if a preference relation $\succ_C$ is not a weak order in general, its restriction to
a specific instance or a class of instances may be a weak order, and thus WWO 
may be applied to the computation of winnow. Again, we are going to consider 
the class of instances Sat(F) for a set of integrity constraints F. 

Definition 6. A preference relation $\succ_C$ is a weak order relative to a set of
integrity constraints $F$ if $\forall r \in Sat(F)$, $\succ_C|_r$ is a weak order.



Theorem 3. An irreflexive preference relation $\succ_C$ is a weak order relative to a
set of FDs $F$ iff the following formula is unsatisfiable:

$$\varphi_F(t_1, t_2) \wedge \varphi_F(t_2, t_3) \wedge \varphi_F(t_1, t_3) \wedge t_1 \succ_C t_2 \wedge t_1 \sim_C t_3 \wedge t_2 \sim_C t_3.$$

Example 7. Consider Example 3, this time with the 0-ary FD $\emptyset \rightarrow$ ISBN. (Such
a dependency might hold, for example, in a relation resulting from the selection
$\sigma_{ISBN=c}$ for some constant $c$.) Note that

$$(i, v, p) \sim_{C_1} (i', v', p') \equiv i \neq i' \vee p = p'.$$

We construct the following formula, according to Theorem 3:

$$i_1 = i_2 \wedge i_2 = i_3 \wedge i_1 = i_3 \wedge i_1 = i_2 \wedge p_1 < p_2 \wedge (i_1 \neq i_3 \vee p_1 = p_3) \wedge (i_2 \neq i_3 \vee p_2 = p_3)$$

which is unsatisfiable. Therefore, $\succ_{C_1}$ is a weak order relative to the FD $\emptyset \rightarrow$
ISBN, and for every instance $r$ satisfying this dependency, $\omega_{C_1}(r)$ can be
computed using the single-pass algorithm WWO.
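
The formula of Theorem 3 for Example 7 can be explored with a bounded search over a small, hypothetical finite domain. This is an illustration of the condition, not a decision procedure; the names `find_violation` and `phi_empty_to_isbn` are illustrative, and the Vendor attribute is omitted because it does not occur in the formula.

```python
from itertools import product

ISBNS = ["a", "b"]   # hypothetical ISBN domain
PRICES = [1, 2, 3]   # hypothetical Price domain
TUPLES = list(product(ISBNS, PRICES))

def prefers_c1(t1, t2):
    """C_1 on (ISBN, Price) pairs."""
    return t1[0] == t2[0] and t1[1] < t2[1]

def indifferent(t1, t2):
    """Neither tuple is preferred to the other."""
    return not prefers_c1(t1, t2) and not prefers_c1(t2, t1)

def phi_empty_to_isbn(t1, t2):
    """phi_f for the FD with empty left-hand side and right-hand side ISBN."""
    return t1[0] == t2[0]

def find_violation(phi_F):
    """Search for t1, t2, t3 satisfying the formula of Theorem 3."""
    for t1, t2, t3 in product(TUPLES, repeat=3):
        fds_hold = phi_F(t1, t2) and phi_F(t2, t3) and phi_F(t1, t3)
        if (fds_hold and prefers_c1(t1, t2)
                and indifferent(t1, t3) and indifferent(t2, t3)):
            return (t1, t2, t3)
    return None

with_fd = find_violation(phi_empty_to_isbn)
without_fd = find_violation(lambda t1, t2: True)
print(with_fd)      # no violation is found when all ISBNs coincide
print(without_fd)   # a violation exists when ISBNs may differ
```

When all tuples share one ISBN, indifference reduces to equality of prices, so no violating triple exists. Without the FD, a tuple with a different ISBN is indifferent to two tuples that are themselves comparable.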

Theorem 4. If:

— the cardinality of the set of FDs $F$ is $|F|$ and its arity is at most $k$;

— the given preference relation is defined using an ipf $C$ containing only atomic
constraints over the same domain and such that $width(C) \le m$, $span(C) \le n$;

— the time complexity of checking satisfiability of a conjunctive ipf with $n$
conjuncts is in $O(T(n))$,

then the time complexity of checking whether $\succ_C$ is a weak order relative to $F$
is in $O(m\, n^{4m}\, k^{|F|}\, T(\max(k|F|, m, n)))$.

Corollary 2. If a preference relation is defined by a conjunctive rational-order
ipf ($m = 1$) and the arity of $F$ is at most 2, then the time complexity of
checking whether $\succ_C$ is a weak order relative to $F$ is in $O(n^5\, 2^{|F|})$.

5 Propagation of integrity constraints 

The study of propagation of integrity constraints by relational operators is es- 
sential for semantic optimization of complex queries. We need to know which 
integrity constraints hold in the results of such operators. The winnow operator 
returns a subset of a given relation, thus it preserves all the functional depen- 
dencies holding in the relation. However, we also know that winnow returns a 
set of tuples which are mutually indifferent. This property can be used to derive 
new dependencies that hold in the result of winnow without necessarily holding 
in the input relation. (New dependencies can also be derived for other relational 
operators, for example selection, as in Example 7.) 



Theorem 5. Let $f$ be an FD and $\succ_C$ an irreflexive preference relation over $R$.
The following formula

$$t_1 \sim_C t_2 \wedge \neg\varphi_f(t_1, t_2)$$

is unsatisfiable iff for every instance $r$ of $R$, $\omega_C(r)$ satisfies $f$.

Proof. We will call the FDs satisfying the condition in Theorem 5 generated
by $\succ_C$ and denote the set of all such dependencies by $G_C$. It is easy to show
that $G_C$ is closed w.r.t. FD implication. Assume $f \notin G_C$. Then the formula in
the theorem is satisfiable. Assume it is satisfied by tuples $t_a$ and $t_b$ ($t_a \neq t_b$
because otherwise $\neg\varphi_f(t_a, t_b)$ is false). Thus $r_0 = \{t_a, t_b\} \notin Sat(f)$. But $t_a \sim_C t_b$,
$t_a \sim_C t_a$, and $t_b \sim_C t_b$. Thus $r_0 = \omega_C(r_0) \notin Sat(f)$.

In the other direction, assume that there is an instance $r_0$ such that $\omega_C(r_0) \notin
Sat(f)$. By the properties of FDs, we can assume that $\omega_C(r_0)$ consists of two
distinct tuples $t_a$ and $t_b$. By Proposition 1, we know that $t_a \sim_C t_b$. Thus the
formula is satisfied by $t_a$ and $t_b$.

Example 8. Consider Example 3. Then the formula from Theorem 5 for the FD
ISBN $\rightarrow$ Price is

$$(i_1 \neq i_2 \vee p_1 = p_2) \wedge i_1 = i_2 \wedge p_1 \neq p_2$$

which is clearly unsatisfiable. Thus, the FD ISBN $\rightarrow$ Price holds in the result
of $\omega_{C_1}$, even though it might not hold in the input relation.
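
The effect described in Example 8 can be observed on the instance of Figure 1: the FD from ISBN to Price fails in the input but holds in the result of winnow. The sketch below uses an illustrative helper, `satisfies_fd`, that takes attribute positions rather than names.

```python
# The instance of Book shown in Figure 1: (ISBN, Vendor, Price).
BOOKS = [
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
]

def prefers_c1(t1, t2):
    """C_1: same ISBN and strictly lower Price."""
    return t1[0] == t2[0] and t1[2] < t2[2]

def winnow(prefers, rows):
    """Keep the tuples of rows that are not dominated by any tuple of rows."""
    return [t for t in rows if not any(prefers(u, t) for u in rows)]

def satisfies_fd(rows, lhs, rhs):
    """True iff tuples that agree on positions lhs also agree on positions rhs."""
    seen = {}
    for t in rows:
        key = tuple(t[i] for i in lhs)
        val = tuple(t[i] for i in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

holds_in_input = satisfies_fd(BOOKS, lhs=[0], rhs=[2])
holds_in_result = satisfies_fd(winnow(prefers_c1, BOOKS), lhs=[0], rhs=[2])
print(holds_in_input, holds_in_result)
```

A single instance cannot establish the dependency for every input, of course; that is what the unsatisfiability test of Theorem 5 provides.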

Theorem 6. If:

— the arity of $f$ is $k$;

— the given preference relation is defined using an ipf $C$ containing only atomic
constraints over the same domain and such that $width(C) \le m$, $span(C) \le n$;

— the time complexity of checking satisfiability of a conjunctive ipf with $n$
conjuncts is in $O(T(n))$,

then the time complexity of checking the condition in Theorem 5 is in
$O(k\, n^{2m}\, T(\max(k, m)))$.

Corollary 3. If a preference relation is defined by a conjunctive rational-order
ipf ($m = 1$) and the arity of $f$ is at most 2, then the time complexity of checking
the condition in Theorem 5 is in $O(n^2)$.

6 Constraint-generating dependencies 

Functional dependencies are a special case of constraint-generating dependencies
[BCW99].

Definition 7. A constraint-generating dependency (CGD) can be expressed as a
formula of the following form:

$$\forall t_1 \ldots \forall t_n.\ [R(t_1) \wedge \cdots \wedge R(t_n) \wedge \gamma(t_1, \ldots, t_n)] \Rightarrow \gamma'(t_1, \ldots, t_n)$$

where $\gamma(t_1, \ldots, t_n)$ and $\gamma'(t_1, \ldots, t_n)$ are constraints over some constraint theory.



CGDs are equivalent to denial constraints. 

Example 9. We give here some examples of CGDs. Consider the relation Emp
with attributes Name, Salary, and Manager, with Name being the primary key.
The constraint that no employee can have a salary greater than that of her
manager is a CGD:

$$\forall n, s, m, s', m'.\ [Emp(n, s, m) \wedge Emp(m, s', m')] \Rightarrow s \le s'.$$

Similarly, single-tuple constraints (CHECK constraints in SQL2) are a special case
of CGDs. For example, the constraint that no employee can have a salary over
$200000 is expressed as:

$$\forall n, s, m.\ [Emp(n, s, m) \Rightarrow s \le 200000].$$

It turns out that the problems studied in the present paper can be viewed
as specific instances of the entailment (implication) of CGDs. To see that, let's
define two special CGDs $d_2^C$ and $d_3^C$ for a given preference relation $\succ_C$ (and the
corresponding indifference relation $\sim_C$):

$$d_2^C \equiv \forall t_1. \forall t_2.\ [R(t_1) \wedge R(t_2)] \Rightarrow t_1 \sim_C t_2$$

and

$$d_3^C \equiv \forall t_1. \forall t_2. \forall t_3.\ [R(t_1) \wedge R(t_2) \wedge R(t_3)] \Rightarrow \neg(t_1 \succ_C t_2 \wedge t_1 \sim_C t_3 \wedge t_2 \sim_C t_3).$$

Then we have the following properties that generalize Theorems 1, 3, and 5.

Theorem 7. $\omega_C$ is redundant w.r.t. a set of CGDs $F$ iff $F$ entails $d_2^C$.

Theorem 8. If $\succ_C$ is irreflexive, then $\succ_C$ is a weak order relative to a set of
CGDs $F$ iff $F$ entails $d_3^C$.

Theorem 9. If $\succ_C$ is irreflexive, then a CGD $f$ is entailed by $d_2^C$ iff for every
instance $r$ of $R$, $\omega_C(r)$ satisfies $f$.

Example 10. Consider the following preference relation $\succ_{C_\alpha}$, where $\alpha$ is a
selection condition over the schema $R$:

$$t_1 \succ_{C_\alpha} t_2 \equiv \alpha(t_1) \wedge \neg\alpha(t_2).$$

This is a very common preference relation expressing the preference for the tuples
satisfying some property over those that do not satisfy it. The corresponding
indifference relation $\sim_{C_\alpha}$ is defined as follows:

$$t_1 \sim_{C_\alpha} t_2 \equiv \alpha(t_1) \wedge \alpha(t_2) \vee \neg\alpha(t_1) \wedge \neg\alpha(t_2).$$

Theorem 7 implies that $\omega_{C_\alpha}$ is redundant w.r.t. a set of CGDs $F$ iff $F$ implies
the CGD

$$\forall t_1. \forall t_2.\ [R(t_1) \wedge R(t_2)] \Rightarrow \alpha(t_1) \wedge \alpha(t_2) \vee \neg\alpha(t_1) \wedge \neg\alpha(t_2).$$

The latter dependency is satisfied by an instance $r$ of $R$ if and only if all the
tuples in $r$ satisfy $\alpha$ or none does. In both cases $\omega_{C_\alpha}(r) = r$.
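
The behaviour described in Example 10 can be seen on the Book instance of Figure 1 with a hypothetical selection condition, "Price below 10", standing in for $\alpha$. The condition and the names `alpha` and `prefers_alpha` are assumptions made for this sketch only.

```python
# The instance of Book shown in Figure 1: (ISBN, Vendor, Price).
BOOKS = [
    ("0679726691", "BooksForLess", 14.75),
    ("0679726691", "LowestPrices", 13.50),
    ("0679726691", "QualityBooks", 18.80),
    ("0062059041", "BooksForLess", 7.30),
    ("0374164770", "LowestPrices", 21.88),
]

def alpha(t):
    """Hypothetical selection condition: Price below 10."""
    return t[2] < 10.0

def prefers_alpha(t1, t2):
    """t1 satisfies alpha and t2 does not."""
    return alpha(t1) and not alpha(t2)

def winnow(prefers, rows):
    """Keep the tuples of rows that are not dominated by any tuple of rows."""
    return [t for t in rows if not any(prefers(u, t) for u in rows)]

mixed = winnow(prefers_alpha, BOOKS)
none_satisfy = [t for t in BOOKS if not alpha(t)]
all_satisfy = [t for t in BOOKS if alpha(t)]

print(mixed)                                               # only tuples satisfying alpha
print(winnow(prefers_alpha, none_satisfy) == none_satisfy) # winnow changes nothing
print(winnow(prefers_alpha, all_satisfy) == all_satisfy)   # winnow changes nothing
```

On the mixed instance winnow keeps only the tuples satisfying the condition; on the two uniform instances it returns its input unchanged, which is the redundancy case singled out by Theorem 7.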



The paper [BCW99] contains an effective reduction using symmetrization
from entailment of CGDs to validity of $\forall$-formulas in the underlying constraint
theory. (A similar construction using symbol mappings is presented in [ZO97].)
This immediately gives the decidability of the problems discussed in the present
paper for equality and rational-order constraints (as well as other constraint
theories for which satisfiability of quantifier-free formulas is decidable). A more
detailed complexity analysis can be carried out along the lines of Theorems 2, 
4, and 6. 

For Theorems 7, 8, and 9 to hold for a class of integrity constraints, two
conditions need to be satisfied: (a) the class should be able to express constraints
equivalent to $d_2^C$ and $d_3^C$, and (b) the notions of entailment and finite entailment
(entailment on finite relations) for the class should coincide. If (b) is not satis- 
fied, then the theorems will still hold if reformulated by replacing "entailment" 
with "finite entailment". Thus, assuming that (a) is satisfied, the effectiveness 
of checking the preconditions of the above theorems depends on the decidability 
of finite entailment for the given class of integrity constraints. 

7 Related work 

The basic reference for semantic query optimization is [CGM90]. The most
common techniques are: join elimination/introduction, predicate elimination and
introduction, and detecting an empty answer set. [CGK+99] discusses the
implementation of predicate introduction and join elimination in an industrial
query optimizer. Semantic query optimization techniques for relational queries
are studied in [ZO97] in the context of denial and referential constraints, and in
[MW00] in the context of constraint tuple-generating dependencies (a
generalization of CGDs and classical relational dependencies). FDs are used for
reasoning about sort orders in [SSM96].

Two different approaches to preference queries have been pursued in the lit- 
erature: qualitative and quantitative. In the qualitative approach, represented 
by [LL87,KG94,KKTG95,BKS01,GJM01,Cho02,Cho03,Kie02,KH02,KK02], the 
preferences between tuples in the answer to a query are specified directly, typi- 
cally using binary preference relations. In the quantitative approach [AW00,HKP01] , 
preferences are specified indirectly using scoring functions that associate a
numeric score with every tuple of the query answer. Then a tuple $t_1$ is preferred
to a tuple $t_2$ iff the score of $t_1$ is higher than the score of $t_2$. The qualitative
approach is strictly more general than the quantitative one, since one can define
preference relations in terms of scoring functions. However, not every intuitively
plausible preference relation can be captured by scoring functions.

Example 11. There is no scoring function that captures the preference relation 
described in Example 1. Since there is no preference defined between any of the 
first three tuples and the fourth one, the score of the fourth tuple should be 
equal to all of the scores of the first three tuples. But this implies that the scores 
of the first three tuples are the same, which is not possible since the second tuple 
is preferred to the first one which in turn is preferred to the third one. 



This lack of expressiveness of the quantitative approach is well known in util- 
ity theory [Fis99,Fis70]. The importance of weak orders in this context comes 
from the fact that only weak orders can be represented using real- valued scoring 
functions (and for countable domains this is also a sufficient condition for the 
existence of such a representation [Fis70]). In the present paper we do not as- 
sume that preference relations are weak orders. We only characterize a condition 
under which preference relations become weak orders relative to a set of integrity 
constraints. 

Algebraic optimization of preference queries is discussed in [Cho03,KH02,KH03].

8 Conclusions and further work

We have presented some techniques for semantic optimization of preference 
queries, focusing on the winnow operator. The simplicity of our results attests to 
the power of logical formulation of preference relations. However, our results are 
applicable not only to the original logical framework of [Cho02,Cho03], but also 
to preference queries defined using preference constructors [Kie02,KK02] and 
skyline queries [BKS01,CGGL03,KRR02,PTFS03] because those queries can be 
expressed using preference formulas. 

Further work can address, for example, the following issues: 

— identifying other semantic optimization techniques for preference queries, 

— expanding the class of integrity constraints by considering, e.g., tuple-generating 
dependencies and referential integrity constraints, 

— identifying weaker but easier to check sufficient conditions for the application 
of our techniques, 

— considering other preference-related operators like ranking [Cho03] . 